Training: https://learn.datacamp.com/projects/nobel-winners
Inspiration: https://www.kaggle.com/kenjee
Documentation: https://seaborn.pydata.org/
pip install geopandas
import os # operating system (files)
import numpy as np # Linear algebra
import pandas as pd # Data processing
import geopandas as gpd # Geometry data for plotting data on (world) maps
import seaborn as sns # Data visualization
import matplotlib.pyplot as plt # Data visualization
from matplotlib.ticker import PercentFormatter # Format axis in percentages
from mpl_toolkits.axes_grid1 import make_axes_locatable # Scale axis of (world) maps
df_nobel = pd.read_csv('archive.csv')
display(df_nobel.tail(50))
| | Year | Category | Prize | Motivation | Prize Share | Laureate ID | Laureate Type | Full Name | Birth Date | Birth City | Birth Country | Sex | Organization Name | Organization City | Organization Country | Death Date | Death City | Death Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 919 | 2013 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for their empirical analysis of asset prices" | 1/3 | 895 | Individual | Lars Peter Hansen | 1952-10-26 | Urbana, IL | United States of America | Male | University of Chicago | Chicago, IL | United States of America | NaN | NaN | NaN |
| 920 | 2013 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for their empirical analysis of asset prices" | 1/3 | 896 | Individual | Robert J. Shiller | 1946-03-29 | Detroit, MI | United States of America | Male | Yale University | New Haven, CT | United States of America | NaN | NaN | NaN |
| 921 | 2013 | Literature | The Nobel Prize in Literature 2013 | "master of the contemporary short story" | 1/1 | 892 | Individual | Alice Munro | 1931-07-10 | Wingham | Canada | Female | NaN | NaN | NaN | NaN | NaN | NaN |
| 922 | 2013 | Medicine | The Nobel Prize in Physiology or Medicine 2013 | "for their discoveries of machinery regulating... | 1/3 | 884 | Individual | James E. Rothman | 1950-11-03 | Haverhill, MA | United States of America | Male | Yale University | New Haven, CT | United States of America | NaN | NaN | NaN |
| 923 | 2013 | Medicine | The Nobel Prize in Physiology or Medicine 2013 | "for their discoveries of machinery regulating... | 1/3 | 885 | Individual | Randy W. Schekman | 1948-12-30 | St. Paul, MN | United States of America | Male | University of California | Berkeley, CA | United States of America | NaN | NaN | NaN |
| 924 | 2013 | Medicine | The Nobel Prize in Physiology or Medicine 2013 | "for their discoveries of machinery regulating... | 1/3 | 885 | Individual | Randy W. Schekman | 1948-12-30 | St. Paul, MN | United States of America | Male | Howard Hughes Medical Institute | NaN | NaN | NaN | NaN | NaN |
| 925 | 2013 | Medicine | The Nobel Prize in Physiology or Medicine 2013 | "for their discoveries of machinery regulating... | 1/3 | 886 | Individual | Thomas C. Südhof | 1955-12-22 | Göttingen | Germany | Male | Stanford University | Stanford, CA | United States of America | NaN | NaN | NaN |
| 926 | 2013 | Medicine | The Nobel Prize in Physiology or Medicine 2013 | "for their discoveries of machinery regulating... | 1/3 | 886 | Individual | Thomas C. Südhof | 1955-12-22 | Göttingen | Germany | Male | Howard Hughes Medical Institute | NaN | NaN | NaN | NaN | NaN |
| 927 | 2013 | Peace | The Nobel Peace Prize 2013 | "for its extensive efforts to eliminate chemic... | 1/1 | 893 | Organization | Organisation for the Prohibition of Chemical W... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 928 | 2013 | Physics | The Nobel Prize in Physics 2013 | "for the theoretical discovery of a mechanism ... | 1/2 | 887 | Individual | François Englert | 1932-11-06 | Etterbeek | Belgium | Male | Université Libre de Bruxelles | Brussels | Belgium | NaN | NaN | NaN |
| 929 | 2013 | Physics | The Nobel Prize in Physics 2013 | "for the theoretical discovery of a mechanism ... | 1/2 | 888 | Individual | Peter W. Higgs | 1929-05-29 | Newcastle upon Tyne | United Kingdom | Male | University of Edinburgh | Edinburgh | United Kingdom | NaN | NaN | NaN |
| 930 | 2014 | Chemistry | The Nobel Prize in Chemistry 2014 | "for the development of super-resolved fluores... | 1/3 | 909 | Individual | Eric Betzig | 1960-01-13 | Ann Arbor, MI | United States of America | Male | Janelia Research Campus, Howard Hughes Medical... | Ashburn, VA | United States of America | NaN | NaN | NaN |
| 931 | 2014 | Chemistry | The Nobel Prize in Chemistry 2014 | "for the development of super-resolved fluores... | 1/3 | 910 | Individual | Stefan W. Hell | 1962-12-23 | Arad | Romania | Male | Max Planck Institute for Biophysical Chemistry | Göttingen | Germany | NaN | NaN | NaN |
| 932 | 2014 | Chemistry | The Nobel Prize in Chemistry 2014 | "for the development of super-resolved fluores... | 1/3 | 910 | Individual | Stefan W. Hell | 1962-12-23 | Arad | Romania | Male | German Cancer Research Center | Heidelberg | Germany | NaN | NaN | NaN |
| 933 | 2014 | Chemistry | The Nobel Prize in Chemistry 2014 | "for the development of super-resolved fluores... | 1/3 | 911 | Individual | William E. Moerner | 1953-06-24 | Pleasanton, CA | United States of America | Male | Stanford University | Stanford, CA | United States of America | NaN | NaN | NaN |
| 934 | 2014 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for his analysis of market power and regulation" | 1/1 | 915 | Individual | Jean Tirole | 1953-08-09 | Troyes | France | Male | Toulouse School of Economics (TSE) | Toulouse | France | NaN | NaN | NaN |
| 935 | 2014 | Literature | The Nobel Prize in Literature 2014 | "for the art of memory with which he has evoke... | 1/1 | 912 | Individual | Patrick Modiano | 1945-07-30 | Paris | France | Male | NaN | NaN | NaN | NaN | NaN | NaN |
| 936 | 2014 | Medicine | The Nobel Prize in Physiology or Medicine 2014 | "for their discoveries of cells that constitut... | 1/2 | 903 | Individual | John O'Keefe | 1939-11-18 | New York, NY | United States of America | Male | University College | London | United Kingdom | NaN | NaN | NaN |
| 937 | 2014 | Medicine | The Nobel Prize in Physiology or Medicine 2014 | "for their discoveries of cells that constitut... | 1/4 | 904 | Individual | May-Britt Moser | 1963-01-04 | Fosnavåg | Norway | Female | Norwegian University of Science and Technology... | Trondheim | Norway | NaN | NaN | NaN |
| 938 | 2014 | Medicine | The Nobel Prize in Physiology or Medicine 2014 | "for their discoveries of cells that constitut... | 1/4 | 905 | Individual | Edvard I. Moser | 1962-04-27 | Ålesund | Norway | Male | Norwegian University of Science and Technology... | Trondheim | Norway | NaN | NaN | NaN |
| 939 | 2014 | Peace | The Nobel Peace Prize 2014 | "for their struggle against the suppression of... | 1/2 | 913 | Individual | Kailash Satyarthi | 1954-01-11 | Vidisha | India | Male | NaN | NaN | NaN | NaN | NaN | NaN |
| 940 | 2014 | Peace | The Nobel Peace Prize 2014 | "for their struggle against the suppression of... | 1/2 | 914 | Individual | Malala Yousafzai | 1997-07-12 | Mingora | Pakistan | Female | NaN | NaN | NaN | NaN | NaN | NaN |
| 941 | 2014 | Physics | The Nobel Prize in Physics 2014 | "for the invention of efficient blue light-emi... | 1/3 | 906 | Individual | Isamu Akasaki | 1929-01-30 | Chiran | Japan | Male | Meijo University | Nagoya | Japan | NaN | NaN | NaN |
| 942 | 2014 | Physics | The Nobel Prize in Physics 2014 | "for the invention of efficient blue light-emi... | 1/3 | 906 | Individual | Isamu Akasaki | 1929-01-30 | Chiran | Japan | Male | Nagoya University | Nagoya | Japan | NaN | NaN | NaN |
| 943 | 2014 | Physics | The Nobel Prize in Physics 2014 | "for the invention of efficient blue light-emi... | 1/3 | 907 | Individual | Hiroshi Amano | 1960-09-11 | Hamamatsu | Japan | Male | Nagoya University | Nagoya | Japan | NaN | NaN | NaN |
| 944 | 2014 | Physics | The Nobel Prize in Physics 2014 | "for the invention of efficient blue light-emi... | 1/3 | 908 | Individual | Shuji Nakamura | 1954-05-22 | Ikata | Japan | Male | University of California | Santa Barbara, CA | United States of America | NaN | NaN | NaN |
| 945 | 2015 | Chemistry | The Nobel Prize in Chemistry 2015 | "for mechanistic studies of DNA repair" | 1/3 | 921 | Individual | Tomas Lindahl | 1938-01-28 | Stockholm | Sweden | Male | Francis Crick Institute | Hertfordshire | United Kingdom | NaN | NaN | NaN |
| 946 | 2015 | Chemistry | The Nobel Prize in Chemistry 2015 | "for mechanistic studies of DNA repair" | 1/3 | 921 | Individual | Tomas Lindahl | 1938-01-28 | Stockholm | Sweden | Male | Clare Hall Laboratory | Hertfordshire | United Kingdom | NaN | NaN | NaN |
| 947 | 2015 | Chemistry | The Nobel Prize in Chemistry 2015 | "for mechanistic studies of DNA repair" | 1/3 | 922 | Individual | Paul Modrich | 1946-06-13 | Raton, NM | United States of America | Male | Howard Hughes Medical Institute | Durham, NC | United States of America | NaN | NaN | NaN |
| 948 | 2015 | Chemistry | The Nobel Prize in Chemistry 2015 | "for mechanistic studies of DNA repair" | 1/3 | 922 | Individual | Paul Modrich | 1946-06-13 | Raton, NM | United States of America | Male | Duke University School of Medicine | Durham, NC | United States of America | NaN | NaN | NaN |
| 949 | 2015 | Chemistry | The Nobel Prize in Chemistry 2015 | "for mechanistic studies of DNA repair" | 1/3 | 923 | Individual | Aziz Sancar | 1946-09-08 | Savur | Turkey | Male | University of North Carolina | Chapel Hill, NC | United States of America | NaN | NaN | NaN |
| 950 | 2015 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for his analysis of consumption, poverty, and... | 1/1 | 926 | Individual | Angus Deaton | 1945-10-19 | Edinburgh | United Kingdom | Male | Princeton University | Princeton, NJ | United States of America | NaN | NaN | NaN |
| 951 | 2015 | Literature | The Nobel Prize in Literature 2015 | "for her polyphonic writings, a monument to su... | 1/1 | 924 | Individual | Svetlana Alexievich | 1948-05-31 | Ivano-Frankivsk | Ukraine | Female | NaN | NaN | NaN | NaN | NaN | NaN |
| 952 | 2015 | Medicine | The Nobel Prize in Physiology or Medicine 2015 | "for their discoveries concerning a novel ther... | 1/4 | 916 | Individual | William C. Campbell | 1930-06-28 | Ramelton | Ireland | Male | Drew University | Madison, NJ | United States of America | NaN | NaN | NaN |
| 953 | 2015 | Medicine | The Nobel Prize in Physiology or Medicine 2015 | "for their discoveries concerning a novel ther... | 1/4 | 917 | Individual | Satoshi Ōmura | 1935-07-12 | Yamanashi Prefecture | Japan | Male | Kitasato University | Tokyo | Japan | NaN | NaN | NaN |
| 954 | 2015 | Medicine | The Nobel Prize in Physiology or Medicine 2015 | "for her discoveries concerning a novel therap... | 1/2 | 918 | Individual | Youyou Tu | 1930-12-30 | Zhejiang Ningbo | China | Female | China Academy of Traditional Chinese Medicine | Beijing | China | NaN | NaN | NaN |
| 955 | 2015 | Peace | The Nobel Peace Prize 2015 | "for its decisive contribution to the building... | 1/1 | 925 | Organization | National Dialogue Quartet | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 956 | 2015 | Physics | The Nobel Prize in Physics 2015 | "for the discovery of neutrino oscillations, w... | 1/2 | 919 | Individual | Takaaki Kajita | 1959-03-09 | Higashimatsuyama | Japan | Male | University of Tokyo | Kashiwa | Japan | NaN | NaN | NaN |
| 957 | 2015 | Physics | The Nobel Prize in Physics 2015 | "for the discovery of neutrino oscillations, w... | 1/2 | 920 | Individual | Arthur B. McDonald | 1943-08-29 | Sydney | Canada | Male | Queen's University | Kingston | Canada | NaN | NaN | NaN |
| 958 | 2016 | Chemistry | The Nobel Prize in Chemistry 2016 | "for the design and synthesis of molecular mac... | 1/3 | 931 | Individual | Jean-Pierre Sauvage | 1944-10-21 | Paris | France | Male | University of Strasbourg | Strasbourg | France | NaN | NaN | NaN |
| 959 | 2016 | Chemistry | The Nobel Prize in Chemistry 2016 | "for the design and synthesis of molecular mac... | 1/3 | 932 | Individual | Sir J. Fraser Stoddart | 1942-05-24 | Edinburgh | United Kingdom | Male | Northwestern University | Evanston, IL | United States of America | NaN | NaN | NaN |
| 960 | 2016 | Chemistry | The Nobel Prize in Chemistry 2016 | "for the design and synthesis of molecular mac... | 1/3 | 933 | Individual | Bernard L. Feringa | 1951-05-18 | Barger-Compascuum | Netherlands | Male | University of Groningen | Groningen | Netherlands | NaN | NaN | NaN |
| 961 | 2016 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for their contributions to contract theory" | 1/2 | 935 | Individual | Oliver Hart | 1948-10-09 | London | United Kingdom | Male | Harvard University | Cambridge, MA | United States of America | NaN | NaN | NaN |
| 962 | 2016 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for their contributions to contract theory" | 1/2 | 936 | Individual | Bengt Holmström | 1949-04-18 | Helsinki | Finland | Male | Massachusetts Institute of Technology (MIT) | Cambridge, MA | United States of America | NaN | NaN | NaN |
| 963 | 2016 | Literature | The Nobel Prize in Literature 2016 | "for having created new poetic expressions wit... | 1/1 | 937 | Individual | Bob Dylan | 1941-05-24 | Duluth, MN | United States of America | Male | NaN | NaN | NaN | NaN | NaN | NaN |
| 964 | 2016 | Medicine | The Nobel Prize in Physiology or Medicine 2016 | "for his discoveries of mechanisms for autophagy" | 1/1 | 927 | Individual | Yoshinori Ohsumi | 1945-02-09 | Fukuoka | Japan | Male | Tokyo Institute of Technology | Tokyo | Japan | NaN | NaN | NaN |
| 965 | 2016 | Peace | The Nobel Peace Prize 2016 | "for his resolute efforts to bring the country... | 1/1 | 934 | Individual | Juan Manuel Santos | 1951-08-10 | Bogotá | Colombia | Male | NaN | NaN | NaN | NaN | NaN | NaN |
| 966 | 2016 | Physics | The Nobel Prize in Physics 2016 | "for theoretical discoveries of topological ph... | 1/2 | 928 | Individual | David J. Thouless | 1934-09-21 | Bearsden | United Kingdom | Male | University of Washington | Seattle, WA | United States of America | NaN | NaN | NaN |
| 967 | 2016 | Physics | The Nobel Prize in Physics 2016 | "for theoretical discoveries of topological ph... | 1/4 | 929 | Individual | F. Duncan M. Haldane | 1951-09-14 | London | United Kingdom | Male | Princeton University | Princeton, NJ | United States of America | NaN | NaN | NaN |
| 968 | 2016 | Physics | The Nobel Prize in Physics 2016 | "for theoretical discoveries of topological ph... | 1/4 | 930 | Individual | J. Michael Kosterlitz | 1943-06-22 | Aberdeen | United Kingdom | Male | Brown University | Providence, RI | United States of America | NaN | NaN | NaN |
def null_count_by_column(df):
    """Print the DataFrame shape and the number of missing values for each column that has any."""
    print(f'DataFrame shape: {df.shape}', end='\n\n')
    col_missing_values = df.isnull().sum().sort_values(ascending=False)
    print(f'DataFrame feature # missing values: \n{col_missing_values[col_missing_values > 0]}')
# The function prints its report and returns None, so call it directly instead of wrapping it in print()
null_count_by_column(df_nobel)
DataFrame shape: (969, 18)
DataFrame feature # missing values:
Death City              370
Death Country           364
Death Date              352
Organization Country    253
Organization City       253
Organization Name       247
Motivation               88
Birth Date               29
Birth City               28
Birth Country            26
Sex                      26
dtype: int64
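If the missing-value counts are needed later (for example to decide which columns to drop), a variant that returns the filtered Series may be handier than one that only prints. The sketch below is an illustration on a made-up frame; the name `missing_values` and the toy data are assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

def missing_values(df):
    """Return the missing-value count per column, keeping only columns with gaps."""
    counts = df.isnull().sum().sort_values(ascending=False)
    return counts[counts > 0]

# Toy frame (hypothetical data): 'b' has two gaps, 'a' has one, 'c' has none
toy = pd.DataFrame({'a': [1, np.nan, 3],
                    'b': [np.nan, np.nan, 'x'],
                    'c': [1, 2, 3]})
report = missing_values(toy)
print(report)
```

Returning the Series keeps the printing decision with the caller, so the same helper works in a report cell and in later filtering steps.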
Adding a feature to the dataset indicating the respective 'Decade' per record, based on the 'Year' the Nobel Prize was awarded.
df_nobel['Decade'] = df_nobel['Year'].apply(lambda x: np.floor(x / 10) * 10).astype(int)
print(f'Unique values for added Decade in the dataset: {df_nobel.Decade.unique()}')
Unique values for added Decade in the dataset: [1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010]
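The same decade can probably be derived without `apply`, using integer floor division on the whole column. This is a minimal sketch on a few made-up years, not the notebook's own data.

```python
import pandas as pd

# Toy years (hypothetical values), chosen to cover decade boundaries
years = pd.Series([1901, 1910, 1999, 2016], name='Year')

# Floor division by 10 drops the last digit; multiplying by 10 restores the decade start.
# The result stays an integer column, so no astype(int) is needed.
decades = years // 10 * 10
print(decades.tolist())
```

The vectorized form avoids a Python-level function call per row, which mostly matters on larger frames.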
Adding a feature to the dataset indicating the respective 'Age' per record, based on 'Birth Date' and the 'Year' the Nobel Prize was awarded. In addition, each record is allocated to an 'Age_Group' based on the calculated 'Age'.
df_nobel['Birth Date'] = pd.to_datetime(df_nobel['Birth Date'], errors='coerce')
df_nobel['Age'] = df_nobel['Year'] - df_nobel['Birth Date'].dt.year
df_nobel['Age_Group'] = pd.cut(df_nobel['Age'], bins=[0, 18, 30, 64, 99],
labels=['Youth', 'Young Adult', 'Adult', 'Senior'])
print('Relative share of Nobel prize winners per added Age_Group:')
display(df_nobel['Age_Group'].value_counts(normalize=True, sort=False).to_frame())
Relative share of Nobel prize winners per added Age_Group:
| | Age_Group |
|---|---|
| Youth | 0.001066 |
| Young Adult | 0.001066 |
| Adult | 0.655650 |
| Senior | 0.342217 |
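By default `pd.cut` builds right-closed intervals, so the bins above read as (0, 18], (18, 30], (30, 64] and (64, 99]. The sketch below uses made-up ages on the bin boundaries to show where each one lands.

```python
import pandas as pd

# Toy ages (hypothetical), placed on and around the bin edges
ages = pd.Series([17, 18, 19, 30, 31, 64, 65, 90])
groups = pd.cut(ages, bins=[0, 18, 30, 64, 99],
                labels=['Youth', 'Young Adult', 'Adult', 'Senior'])
print(pd.DataFrame({'Age': ages, 'Age_Group': groups}))
```

An age of exactly 18 is therefore still 'Youth' and an age of exactly 64 is still 'Adult'; any age above 99 would fall outside the bins and become NaN.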
Adding a feature to the dataset indicating the respective 'Generation' per record based on 'Birth Date'.
# Source: https://en.wikipedia.org/wiki/Generation#/media/File:Generation_timeline.svg
generations = ['Ancient', 'Lost Generation', 'Greatest Generation', 'Silent Generation', 'Boomers I', 'Boomers II', 'Generation X', 'Millennials (Y)', 'Generation Z', 'Generation Alpha']
age_bins = [min(df_nobel['Birth Date'].dt.year), 1883, 1900, 1927, 1945, 1955, 1965, 1980, 1996, 2012, 2021]
df_nobel['Generation'] = pd.cut(df_nobel['Birth Date'].dt.year, age_bins, labels=generations)
display(df_nobel['Generation'].value_counts(normalize=True, sort=False).to_frame())
| | Generation |
|---|---|
| Ancient | 0.202775 |
| Lost Generation | 0.122732 |
| Greatest Generation | 0.323372 |
| Silent Generation | 0.243330 |
| Boomers I | 0.076841 |
| Boomers II | 0.023479 |
| Generation X | 0.006403 |
| Millennials (Y) | 0.000000 |
| Generation Z | 0.001067 |
| Generation Alpha | 0.000000 |
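One detail of `pd.cut` is worth checking here: the lowest bin edge is exclusive by default. Because the first edge is the minimum birth year itself, the earliest-born laureate falls outside every bin and gets NaN unless `include_lowest=True` is passed. The sketch below shows the effect on made-up birth years with only two bins.

```python
import pandas as pd

# Toy birth years (hypothetical); the first one equals the lowest bin edge
birth_years = pd.Series([1817, 1850, 1883, 1884])
edges = [birth_years.min(), 1883, 1900]
labels = ['Ancient', 'Lost Generation']

default_cut = pd.cut(birth_years, edges, labels=labels)
inclusive_cut = pd.cut(birth_years, edges, labels=labels, include_lowest=True)

# Number of unassigned records under each setting
n_missing_default = int(default_cut.isna().sum())
n_missing_inclusive = int(inclusive_cut.isna().sum())
print(n_missing_default, n_missing_inclusive)
```

Whether to switch the notebook to `include_lowest=True` is a judgment call; it would shift the reported shares slightly, since one more record would be counted as 'Ancient'.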
Individual winners of the Nobel Prize are identified by 'Laureate ID'. Paul Modrich is listed twice for the same 'Category' and 'Year', each time with a different 'Organization Name'. These records indicate that he was affiliated with two organizations when he won, not that he won twice, so such repeated rows are dropped when counting Nobel Prizes. Marie Curie, by contrast, was awarded the Nobel Prize twice, once in 1903 for Physics and once in 1911 for Chemistry. Her two records differ in 'Year' and 'Category' and are therefore counted as separate awards.
df_nobel_prizes = df_nobel.drop_duplicates(subset=['Year', 'Category', 'Laureate ID'])
print(f'Number of (possibly shared) Nobel Prizes handed out between 1901 and 2016: {len(df_nobel_prizes)}')
Number of (possibly shared) Nobel Prizes handed out between 1901 and 2016: 911
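The deduplication rule can be sketched on a toy frame. The 2015 rows mirror the Paul Modrich records shown earlier; the laureate ID and organization values for the second person are placeholders chosen for illustration, not values from the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    'Year': [2015, 2015, 1903, 1911],
    'Category': ['Chemistry', 'Chemistry', 'Physics', 'Chemistry'],
    # 922 appears twice for one prize; 1 is a placeholder ID for a two-time winner
    'Laureate ID': [922, 922, 1, 1],
    'Organization Name': ['Howard Hughes Medical Institute',
                          'Duke University School of Medicine',
                          'Org A (placeholder)', 'Org B (placeholder)'],
})

# Rows are duplicates only if year, category and laureate all match,
# so a second affiliation is dropped while a second prize is kept
toy_prizes = toy.drop_duplicates(subset=['Year', 'Category', 'Laureate ID'])
print(toy_prizes[['Year', 'Category', 'Laureate ID']])
```

`drop_duplicates` keeps the first matching row by default, so the surviving row carries whichever organization was listed first.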
Saving CSV file for data visualization in Tableau
from pathlib import Path
filepath = Path('BigDataProject/out.csv')
filepath.parent.mkdir(parents=True, exist_ok=True)
df_nobel.to_csv(filepath)
!pip install seaborn
# seaborn.apionly no longer exists in current seaborn releases; import the top-level package instead
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt
# ChemistryDF must be defined before it is plotted
ChemistryDF = df_nobel[df_nobel['Category'] == 'Chemistry']
plt.figure(figsize=(10,12))
ChemistryGraph = sns.countplot(y="Birth Country", data=ChemistryDF,
                               order=ChemistryDF['Birth Country'].value_counts().index,
                               palette='GnBu_d')
plt.show()
data=df_nobel
ChemistryDF = data[(data.Category == 'Chemistry')]
EconomicsDF = data[(data.Category == 'Economics')]
LiteratureDF = data[(data.Category == 'Literature')]
MedicineDF = data[(data.Category == 'Medicine')]
PeaceDF = data[(data.Category == 'Peace')]
PhysicsDF = data[(data.Category == 'Physics')]
femaledata=data[(data.Sex == 'Female')]
plt.figure(figsize=(10,12))
FemaleGraph = sns.countplot(y="Birth Country", data=femaledata,
order=femaledata['Birth Country'].value_counts().index,
palette='GnBu_d')
plt.show()
plt.figure(figsize=(10,12))
ChemistryGraph = sns.countplot(y="Birth Country", data=ChemistryDF,
order=ChemistryDF['Birth Country'].value_counts().index,
palette='GnBu_d')
plt.show()
# The majority of winners are male, but the number of female laureates rises slightly in recent decades
fig, ax = plt.subplots(figsize=(10, 10))
sns.countplot(data=df_nobel_prizes, x='Decade', hue='Sex', ax=ax)
ax.set_title('Countplot of the number of Nobel Prizes won in history:')
ax.set_ylabel('Nobel prize count')
ax.legend(loc='upper right')
# Female laureates only: number of records per decade
fig, ax = plt.subplots(figsize=(10, 10))
sns.countplot(data=data[(data.Sex=="Female")], hue='Sex', x='Decade', ax=ax)
ax.set_title('Countplot of the number of Nobel Prizes won by female laureates in history:')
ax.set_ylabel('Nobel prize count')
# Female laureates only: number of records per decade, split by category
fig, ax = plt.subplots(figsize=(10, 10))
sns.countplot(data=data[(data.Sex=="Female")], hue='Category', x='Decade', ax=ax)
ax.set_title('Countplot of the number of Nobel Prizes won by female laureates in history, by category:')
ax.set_ylabel('Nobel prize count')
# The regression plot indeed indicates the share of female winners has increased over the years
# Share of prizes per sex within each decade (each row sums to 1); rows without a 'Sex' value are ignored
pivot = pd.crosstab(df_nobel_prizes['Decade'], df_nobel_prizes['Sex'], normalize='index')
fig, ax = plt.subplots(figsize=(10, 10))
sns.regplot(ax=ax, data=pivot, x=pivot.index, y=pivot['Male'], color="lightblue")
sns.regplot(ax=ax, data=pivot, x=pivot.index, y=pivot['Female'], color="pink")
ax.set_title(f'Regression plot of the share of Male/Female Nobel Prizes won in history:')
ax.set_ylabel(f'Share of Nobel Prizes')
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
ax.legend(labels=['Male', 'Female'])
# In absolute numbers the increase in female winners seems modest; in relative terms the regression slope indicates:
# np.polyfit returns the slope per year (the index holds decade start years), so multiply by 10 for the change per decade
a, b = np.polyfit(pivot.index, pivot['Female'], 1)
print(f'Over the full history the share of Female Nobel prize winners has increased by {a*10*100:.1f} percentage points per decade.')
Over the full history the share of Female Nobel prize winners has increased by 0.4 percentage points per decade.
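The slope returned by `np.polyfit` is expressed per unit of x. Since the x values are decade start years (1900, 1910, ...), one unit is a calendar year, not a decade. The sketch below checks the conversion on made-up shares that rise by exactly one percentage point per decade.

```python
import numpy as np

# Made-up data: the share rises from 2% to 5%, i.e. +1 percentage point per decade
decades = np.array([1900, 1910, 1920, 1930])
share = np.array([0.02, 0.03, 0.04, 0.05])

slope_per_year, intercept = np.polyfit(decades, share, 1)

# x10 converts "per year" to "per decade"; x100 converts a fraction to percentage points
slope_per_decade_pp = slope_per_year * 10 * 100
print(f'{slope_per_decade_pp:.1f} percentage points per decade')
```

Reporting the result in percentage points also avoids confusion with a relative (percent-of-a-percent) increase.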
# Adding the regression lines for the last decades indicates the increase in share of female winners has become steeper
fig, ax = plt.subplots(figsize=(10, 10))
sns.regplot(ax=ax, data=pivot, x=pivot.index, y=pivot['Male'], color="lightblue")
sns.regplot(ax=ax, data=pivot, x=pivot.index, y=pivot['Female'], color="pink")
sns.regplot(ax=ax, data=pivot, x=pivot.index[-4:], y=pivot['Male'][-4:], color="blue")
sns.regplot(ax=ax, data=pivot, x=pivot.index[-4:], y=pivot['Female'][-4:], color="red")
ax.set_title(f'Regression plot of the share of Male/Female Nobel Prizes won over the last decades:')
ax.set_ylabel(f'Share of Nobel Prizes')
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
ax.legend(labels=['Male', 'Female'])
# Over the last four decades the regression slope is clearly steeper:
# np.polyfit returns the slope per year (the index holds decade start years), so multiply by 10 for the change per decade
a, b = np.polyfit(pivot.index[-4:], pivot['Female'][-4:], 1)
print(f'Over the last 4 decades the share of Female Nobel prize winners has increased by {a*10*100:.1f} percentage points per decade.')
Over the last 4 decades the share of Female Nobel prize winners has increased by 2.0 percentage points per decade.
# Nobel prizes won by male and female laureates per category
sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(10, 10))
sns.countplot(x='Category', data=df_nobel_prizes, hue='Sex', palette={"Male": "lightblue", "Female": "pink"}, ax=ax)
ax.set_title(f'Countplot of the number of Nobel Prizes won per category:')
ax.set_xlabel('Nobel prize category')
ax.set_ylabel('Nobel prize count')
ax.legend(loc='upper right')
# Nobel prizes won by female laureates per category
sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(10, 10))
sns.countplot(x='Category', data=data[(data.Sex == 'Female')], hue='Sex', palette={"Male": "lightblue", "Female": "pink"}, ax=ax)
ax.set_title('Countplot of the number of Nobel Prizes won by female laureates per category:')
ax.set_xlabel('Nobel prize category')
ax.set_ylabel('Nobel prize count')
ax.legend(loc='upper right')
# In some categories female laureates are relatively well represented in the most recent decade
g = sns.catplot(kind='count', data=df_nobel, x='Decade', hue='Sex', col='Category', col_wrap=3,
                palette={"Male": "lightblue", "Female": "pink"})
g.fig.subplots_adjust(top=0.9)
g.fig.suptitle('Countplot of the number of Nobel Prizes won in history per category:')
# The same view restricted to female laureates
g = sns.catplot(kind='count', data=data[(data.Sex == 'Female')], x='Decade', hue='Sex', col='Category', col_wrap=3,
                palette={"Male": "lightblue", "Female": "pink"})
g.fig.subplots_adjust(top=0.9)
g.fig.suptitle('Countplot of the number of Nobel Prizes won by female laureates in history per category:')
g = sns.FacetGrid(df_nobel_prizes, row='Category', height=2, aspect=4)
g.map_dataframe(sns.regplot, x='Year', y='Age', scatter=False, lowess=True, line_kws={'color': 'black'}) # Only Lowess for Male/Female combined
g.map_dataframe(sns.scatterplot, x='Year', y='Age', hue='Age_Group', palette={"Youth": "orange", "Young Adult": "forestgreen", "Adult": "royalblue", "Senior": "lightsteelblue"})
g.add_legend()
# Age at award over time for female laureates only; a separate frame keeps df_nobel_prizes intact
fig, ax = plt.subplots(figsize=(10, 10))
df_female_prizes = df_nobel_prizes[df_nobel_prizes['Sex'] == 'Female']
sns.regplot(ax=ax, data=df_female_prizes, x='Year', y='Age', scatter=False, lowess=True, line_kws={'color': 'black'})
sns.regplot(ax=ax, data=df_female_prizes[df_female_prizes['Age_Group'] == 'Youth'],
            x='Year', y='Age', lowess=True, fit_reg=False, color="orange")
sns.regplot(ax=ax, data=df_female_prizes[df_female_prizes['Age_Group'] == 'Young Adult'],
            x='Year', y='Age', lowess=True, fit_reg=False, color="forestgreen")
sns.regplot(ax=ax, data=df_female_prizes[df_female_prizes['Age_Group'] == 'Adult'],
            x='Year', y='Age', lowess=True, fit_reg=False, color="royalblue")
sns.regplot(ax=ax, data=df_female_prizes[df_female_prizes['Age_Group'] == 'Senior'],
            x='Year', y='Age', lowess=True, fit_reg=False, color="lightsteelblue")
ax.set_title('Regression plot of Age in relation to Nobel Prizes won by female laureates in history:')
ax.legend(labels=['Average Age', 'Youth', 'Young Adult', 'Adult', 'Senior'], loc='upper right')
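plt.figure(figsize=(8,5))
# Count each laureate once per sex, even when listed on several rows
gb_gender = df_nobel.groupby('Sex')['Laureate ID'].nunique()
# Take the labels from the grouped index so they always match the wedges
plt.pie(x=gb_gender, labels=gb_gender.index, autopct="%.1f%%")
plt.tight_layout()
plt.show()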
df_nobel.head()
| | Year | Category | Prize | Motivation | Prize Share | Laureate ID | Laureate Type | Full Name | Birth Date | Birth City | Birth Country | Sex | Organization Name | Organization City | Organization Country | Death Date | Death City | Death Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | 160 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Male | Berlin University | Berlin | Germany | 1911-03-01 | Berlin | Germany |
| 1 | 1901 | Literature | The Nobel Prize in Literature 1901 | "in special recognition of his poetic composit... | 1/1 | 569 | Individual | Sully Prudhomme | 1839-03-16 | Paris | France | Male | NaN | NaN | NaN | 1907-09-07 | Châtenay | France |
| 2 | 1901 | Medicine | The Nobel Prize in Physiology or Medicine 1901 | "for his work on serum therapy, especially its... | 1/1 | 293 | Individual | Emil Adolf von Behring | 1854-03-15 | Hansdorf (Lawice) | Prussia (Poland) | Male | Marburg University | Marburg | Germany | 1917-03-31 | Marburg | Germany |
| 3 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | 462 | Individual | Jean Henry Dunant | 1828-05-08 | Geneva | Switzerland | Male | NaN | NaN | NaN | 1910-10-30 | Heiden | Switzerland |
| 4 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | 463 | Individual | Frédéric Passy | 1822-05-20 | Paris | France | Male | NaN | NaN | NaN | 1912-06-12 | Paris | France |
plt.figure(figsize=(12,6))
plt.tight_layout()
gb_cag=df_nobel.groupby('Category')['Laureate ID'].apply(lambda x:len(x.drop_duplicates()))
plt.subplot(121)
plt.pie(x=gb_cag,autopct="%.1f%%",explode=(0,0.1,0.1,0.1,0,0.1),shadow=True)
plt.legend(gb_cag.index)  # all six categories, in the same order as the pie wedges
plt.subplot(122)
gb_cag.sort_values().plot(kind='bar')
plt.show()
df_nobel_prizes = df_nobel.drop_duplicates(subset=['Year', 'Category', 'Laureate ID'])
print(f'Number of (possibly shared) Nobel Prizes handed out between 1901 and 2016: {len(df_nobel_prizes)}')
Number of (possibly shared) Nobel Prizes handed out between 1901 and 2016: 911
import plotly.express as px
organizations = df_nobel['Organization Name'].dropna().unique()
len(organizations)
315
organization_names = df_nobel.groupby('Organization Name')['Organization Name'].count().reset_index(name = 'count').sort_values(by='count', ascending = False)
fig = px.bar(organization_names[0:16], y='Organization Name', x = 'count', color = 'Organization Name')
fig.show()
organization_names.set_index('Organization Name', inplace=True)
cat_org = df_nobel.groupby(['Organization Name', 'Category'])['Organization Name'].count().reset_index(name = 'count').sort_values(by='count', ascending = False)
cat_org['NumberPerOrganization']=0
for org in organizations:
    cat_org['NumberPerOrganization'] += (cat_org['Organization Name']==org)*organization_names.loc[org, 'count']
cat_org.sort_values(by=['NumberPerOrganization', 'Organization Name'], ascending = False, inplace=True)
fig = px.bar(cat_org[:53], y = 'Organization Name', x = 'count', color='Category').update_yaxes(categoryorder='total ascending')
fig.show()
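The per-organization total computed by the loop above can also be obtained without iterating over every organization. The following is a minimal sketch on hypothetical toy rows (the `toy` frame and its values are made up for illustration, not taken from the dataset); it uses `groupby(...).transform('sum')` to broadcast each organization's total back to its rows.

```python
import pandas as pd

# Toy stand-in for df_nobel (hypothetical rows, not the real data)
toy = pd.DataFrame({
    'Organization Name': ['Org A', 'Org A', 'Org A', 'Org B', 'Org B', 'Org C'],
    'Category': ['Physics', 'Physics', 'Chemistry', 'Peace', 'Peace', 'Medicine'],
})

# Prizes per (organization, category) pair
cat_org_toy = (toy.groupby(['Organization Name', 'Category'])
                  .size()
                  .reset_index(name='count'))

# Total prizes per organization, broadcast back to every row of that organization
cat_org_toy['NumberPerOrganization'] = (
    cat_org_toy.groupby('Organization Name')['count'].transform('sum'))

# Same ordering as in the notebook: biggest organizations first
cat_org_toy = cat_org_toy.sort_values(
    by=['NumberPerOrganization', 'Organization Name'], ascending=False)
print(cat_org_toy)
```

On the real data this should give the same `NumberPerOrganization` values as the loop, since the per-category counts of an organization add up to its overall count.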
import plotly.io as pio
pio.write_html(fig, file= 'index.html', auto_open = True)
fig = px.bar(cat_org, x='count', y = 'Category', color = 'Organization Name')
fig.update_layout(width=1600, height=500)
fig.show()
cat_year = df_nobel.groupby(['Year','Category'])['Category'].count().reset_index(name = 'count')
fig = px.bar(cat_year, x='Year', y = 'count', color = 'Category')
fig.show()
gen_year = df_nobel.groupby(['Year','Sex', 'Category'])['Year'].count().reset_index(name = 'count')
fig = px.bar(gen_year, x='Year', y = 'count', color = 'Sex')
fig.show()
import plotly.graph_objects as go
from plotly.subplots import make_subplots
pie_df1960=df_nobel[df_nobel['Year']<=1960]['Sex'].value_counts().reset_index()
pie_df1960.columns=['sex', 'count']
pie_df1961_pr=df_nobel[df_nobel['Year']>1960]['Sex'].value_counts().reset_index()
pie_df1961_pr.columns=['sex', 'count']
fig = make_subplots(1, 2, specs=[[{'type':'domain'}, {'type':'domain'}]],
                    subplot_titles=['1901-1960', '1961-2016'])
# Take the labels from the data so they always match the order of the counts
fig.add_trace(go.Pie(labels=pie_df1960['sex'], values=pie_df1960['count'],
                     name='1901-1960'), 1, 1)
fig.add_trace(go.Pie(labels=pie_df1961_pr['sex'], values=pie_df1961_pr['count'],
                     name='1961-2016'), 1, 2)
#fig=px.pie(pie_df1960, values="count", names="sex", title="proportion of genders",color_discrete_sequence=['blue', 'red'])
fig.show()
pie_df_chemistry =df_nobel[df_nobel['Category']=='Chemistry']['Sex'].value_counts().reset_index()
pie_df_literature = df_nobel[df_nobel['Category']=='Literature']['Sex'].value_counts().reset_index()
pie_df_medicine = df_nobel[df_nobel['Category']=='Medicine']['Sex'].value_counts().reset_index()
pie_df_peace = df_nobel[df_nobel['Category']=='Peace']['Sex'].value_counts().reset_index()
pie_df_physics = df_nobel[df_nobel['Category']=='Physics']['Sex'].value_counts().reset_index()
pie_df_economics = df_nobel[df_nobel['Category']=='Economics']['Sex'].value_counts().reset_index()
pie_df_chemistry.columns=['sex', 'count']
pie_df_literature.columns=['sex', 'count']
pie_df_medicine.columns=['sex', 'count']
pie_df_peace.columns=['sex', 'count']
pie_df_physics.columns=['sex', 'count']
pie_df_economics.columns=['sex', 'count']
fig = make_subplots(2, 3, specs=[[{'type':'domain'}, {'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}, {'type':'domain'}]],
vertical_spacing=0.2, horizontal_spacing=0.08, row_heights=[4, 4], subplot_titles=('Chemistry', 'Literature', 'Medicine', 'Peace', 'Physics', 'Economics'))
fig.add_trace(go.Pie(labels=['male', 'female'], values=pie_df_chemistry['count'],
name='Female in Chemistry'), 1, 1)
fig.add_trace(go.Pie(labels=['male', 'female'], values=pie_df_literature['count'], name='Female in Literature'), 1, 2)
fig.add_trace(go.Pie(labels=['male', 'female'], values=pie_df_medicine['count'],
name='Female in Medicine'), 1, 3)
fig.add_trace(go.Pie(labels=['male', 'female'], values=pie_df_peace['count'],
name='Female in Peace'), 2, 1)
fig.add_trace(go.Pie(labels=['male', 'female'], values=pie_df_physics['count'],
name='Female in Physics'), 2, 2)
fig.add_trace(go.Pie(labels=['male', 'female'], values=pie_df_economics['count'],
name='Female in Economics'), 2, 3)
#fig=px.pie(pie_df1960, values="count", names="sex", title="proportion of genders",color_discrete_sequence=['blue', 'red'])
fig.update_traces(textinfo='none')
fig.show()
df_nobel['Birth Date'] = pd.to_datetime(df_nobel['Birth Date'], errors='coerce')
df_nobel['age'] = df_nobel['Year'] - df_nobel['Birth Date'].dt.year
plt.figure(figsize=(15, 7))
sns.swarmplot(x='Sex', y='age',hue = 'Category', dodge=True , data=df_nobel)
plt.ylabel('Age')
plt.xlabel('Gender')
plt.title('Age of every winner, separated by gender and prize category')
plt.show()
# Keep Year numeric: pd.to_datetime on integer years reads them as nanoseconds since 1970,
# and lmplot then fails with "TypeError: float() argument must be a string or a number, not 'Timestamp'".
df_nobel['winning_age'] = df_nobel['Year'] - df_nobel['Birth Date'].dt.year
with sns.axes_style("whitegrid"):
    sns.lmplot(data=df_nobel,
               x='Year',
               y='winning_age',
               hue='Category',
               lowess=True,
               aspect=2,
               scatter_kws = {'alpha': 0.5},
               line_kws = {'linewidth': 5})
plt.title('Laureate Age when awarded')
plt.show()
df_nobel.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 969 entries, 0 to 968
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Year                  969 non-null    int64
 1   Category              969 non-null    object
 2   Prize                 969 non-null    object
 3   Motivation            881 non-null    object
 4   Prize Share           969 non-null    object
 5   Laureate ID           969 non-null    int64
 6   Laureate Type         969 non-null    object
 7   Full Name             969 non-null    object
 8   Birth Date            938 non-null    datetime64[ns]
 9   Birth City            941 non-null    object
 10  Birth Country         943 non-null    object
 11  Sex                   943 non-null    object
 12  Organization Name     722 non-null    object
 13  Organization City     716 non-null    object
 14  Organization Country  716 non-null    object
 15  Death Date            617 non-null    object
 16  Death City            599 non-null    object
 17  Death Country         605 non-null    object
 18  age                   938 non-null    float64
dtypes: datetime64[ns](1), float64(1), int64(2), object(15)
memory usage: 144.0+ KB
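The `info()` output lists `Year` as `int64`. Passing such integers to `pd.to_datetime` without a format does not produce calendar years: they are interpreted as nanoseconds since 1970-01-01. The short sketch below (with two made-up example years) shows the difference; for regression plots it is usually simpler to leave `Year` as a plain number.

```python
import pandas as pd

years = pd.Series([1901, 2016])

# Without a format, integers are read as nanoseconds since 1970-01-01
as_epoch = pd.to_datetime(years)

# Parsing the year as a string with an explicit format keeps the calendar year
as_year = pd.to_datetime(years.astype(str), format='%Y')

print(as_epoch.dt.year.tolist())
print(as_year.dt.year.tolist())
```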
# Define geopandas geometry dataframe:
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[(world.pop_est>0) & (world.name!="Antarctica")] # Reflect countries with population, leaving Antarctica out
# Define nobel prize dataframe:
nobel = df_nobel_prizes.drop(['Birth Date', 'Death Date'], axis = 1) # Geopandas conflict with Date format
nobel = nobel[(nobel.Sex == 'Female')].copy()
nobel['Nobel_Country_Count'] = nobel.groupby('Birth Country')['Birth Country'].transform('count') # Derives count of Nobel prizes won by women per Birth Country
# Merge geopandas geometry and nobel prize dataframes
df = pd.merge(nobel, world, how='left', left_on='Birth Country', right_on='name').reset_index()
df_gdf = gpd.GeoDataFrame(df)
# Identify countries of birth for which 'geometry' was not merged
countries_not_reflected = df_gdf[df_gdf['geometry'].isna()]['Birth Country'].unique()
print(f'#{len(countries_not_reflected)} countries are not reflected in the world map as the country name differed over time (example: {countries_not_reflected[0]})')
# Plot world map!
fig, ax = plt.subplots(figsize=(30, 15))
ax.set_title('Countplot of the number of Nobel Prizes won by women per country:')
ax.set_axis_off()
# Format legend
divider = make_axes_locatable(ax)
cax = divider.append_axes("left", size="3%", pad=0.1)
world.plot(ax=ax, color='lightgrey')
df_gdf.plot(ax=ax, column='Nobel_Country_Count',cmap='viridis', legend=True, cax=cax)
plt.show()
#9 countries are not reflected in the world map as the country name differed over time (example: Russian Empire (Poland))
nobel['Nobel_Country_Count'].shape
(49,)
# Define geopandas geometry dataframe:
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[(world.pop_est>0) & (world.name!="Antarctica")] # Reflect countries with population, leaving Antarctica out
# Define nobel prize dataframe:
nobel = df_nobel_prizes.drop(['Birth Date', 'Death Date'], axis = 1) # Geopandas conflict with Date format
nobel['Nobel_Country_Count'] = df_nobel_prizes.groupby('Birth Country')['Birth Country'].transform('count') # Derives count of Nobel prizes per Birth Country
# Merge geopandas geometry and nobel prize dataframes
df = pd.merge(nobel, world, how='left', left_on='Birth Country', right_on='name').reset_index()
df_gdf = gpd.GeoDataFrame(df)
# Identify countries of birth for which 'geometry' was not merged
countries_not_reflected = df_gdf[df_gdf['geometry'].isna()]['Birth Country'].unique()
print(f'#{len(countries_not_reflected)} countries are not reflected in the world map as the country name differed over time (example: {countries_not_reflected[0]})')
# Plot world map!
fig, ax = plt.subplots(figsize=(20, 10))
ax.set_title(f'Countplot of the number of Nobel Prizes won per country:')
ax.set_axis_off()
# Format legend
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)
world.plot(ax=ax, color='lightgrey')
df_gdf.plot(ax=ax, column='Nobel_Country_Count',cmap='viridis', legend=True, cax=cax)
plt.show()
#68 countries are not reflected in the world map as the country name differed over time (example: Prussia (Poland))
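Many of the unmatched names follow the pattern "historic name (present-day name)", as in `Prussia (Poland)`. One possible way to recover them is sketched below on hypothetical toy frames. The assumption that the text in parentheses is always the present-day country is mine and should be checked against the data. The `indicator=True` option of `merge` adds a `_merge` column that marks rows without a match.

```python
import pandas as pd

# Toy stand-ins (hypothetical): laureate birth countries and map country names
laureates = pd.DataFrame({'Birth Country': ['France', 'Prussia (Poland)',
                                            'Russian Empire (Poland)', 'Japan']})
world_names = pd.DataFrame({'name': ['France', 'Poland', 'Japan']})

# Assumption: the name in parentheses is the present-day country
laureates['Map Country'] = (laureates['Birth Country']
                            .str.extract(r'\(([^)]+)\)', expand=False)
                            .fillna(laureates['Birth Country']))

# indicator=True adds a '_merge' column: 'both' or 'left_only'
merged = laureates.merge(world_names, how='left', left_on='Map Country',
                         right_on='name', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Birth Country'].unique()
print(f'{len(unmatched)} birth countries still unmatched')
```

Names that remain unmatched after this step would need a manual mapping.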
# The top-10 countries where most Nobel prize laureates were born
df = df_nobel_prizes['Birth Country'].value_counts().reset_index()
# Determine medal coloring for top-3 countries with respect to Nobel prize winners
colors = {}
for index, row in df.iterrows():
    n = row['Birth Country']
    if n==df['Birth Country'][0]:
        colors[row['index']] = 'gold'
    elif n==df['Birth Country'][1]:
        colors[row['index']] = 'silver'
    elif n==df['Birth Country'][2]:
        colors[row['index']] = 'darkgoldenrod'
    elif row['index']=='Netherlands':
        colors[row['index']] = 'orange'
    else:
        colors[row['index']] = 'lightblue'
fig, ax = plt.subplots(figsize=(10, 10))
sns.barplot(data=df.head(n=10), x='index', y='Birth Country', palette=colors, ax=ax)
ax.set_title(f'Countplot of the number of Nobel Prizes won by Birth Country:')
ax.set_ylabel('Nobel prize count')
ax.set_xticklabels(ax.get_xticklabels(), rotation=-25)
[Text(0, 0, 'United States of America'), Text(1, 0, 'United Kingdom'), Text(2, 0, 'Germany'), Text(3, 0, 'France'), Text(4, 0, 'Sweden'), Text(5, 0, 'Japan'), Text(6, 0, 'Canada'), Text(7, 0, 'Netherlands'), Text(8, 0, 'Italy'), Text(9, 0, 'Russia')]
def country_rank(df, country):
    """Prints the rank of `country` by number of Nobel prizes per 'Birth Country'"""
    countries = df['Birth Country'].unique()
    df = df['Birth Country'].value_counts().reset_index()
    df = df[df['index'] == country].reset_index() # Index number +1 is rank
    df.set_axis(['Rank', 'Country', 'Count'], axis=1, inplace=True)
    if len(df.index) == 1:
        c = df.loc[0, 'Country']
        i = df.loc[0, 'Rank']
        n = df.loc[0, 'Count']
        print(f'Country: {c} is ranked at place {i + 1} with #{n} Nobel prize winners')
    else:
        print(f'No valid records retrieved for: {country}, please submit any of the following countries: {countries}')
# How does your country rank with respect to Nobel prize laureates?
country_rank(df_nobel_prizes, country='Netherlands')
Country: Netherlands is ranked at place 8 with #18 Nobel prize winners
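The rank printed above is the position of the country in the sorted counts, so two countries with the same number of prizes get different ranks. A variant that gives tied countries the same rank is sketched below. The helper name `country_rank_sketch` and the toy values are hypothetical, and it returns the result instead of printing it.

```python
import pandas as pd

def country_rank_sketch(birth_countries: pd.Series, country: str):
    """Return (rank, count) for `country`, or None if it does not occur."""
    counts = birth_countries.value_counts()
    if country not in counts.index:
        return None
    # method='min' gives tied countries the same (best) rank
    ranks = counts.rank(method='min', ascending=False).astype(int)
    return int(ranks[country]), int(counts[country])

# Toy data: A has 3 prizes, B and C have 2 each, D has 1
toy = pd.Series(['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'])
print(country_rank_sketch(toy, 'C'))
```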
fig, ax = plt.subplots(figsize=(10, 10))
sns.regplot(ax=ax, data=df_nobel_prizes, x='Year', y='Age', scatter=False, lowess=True, line_kws={'color': 'black'})
sns.regplot(ax=ax, data=df_nobel_prizes[df_nobel_prizes['Age_Group'] == 'Youth']
, x='Year', y='Age', lowess=True, fit_reg=False, color="orange")
sns.regplot(ax=ax, data=df_nobel_prizes[df_nobel_prizes['Age_Group'] == 'Young Adult']
, x='Year', y='Age', lowess=True, fit_reg=False, color="forestgreen")
sns.regplot(ax=ax, data=df_nobel_prizes[df_nobel_prizes['Age_Group'] == 'Adult']
, x='Year', y='Age', lowess=True, fit_reg=False, color="royalblue")
sns.regplot(ax=ax, data=df_nobel_prizes[df_nobel_prizes['Age_Group'] == 'Senior']
, x='Year', y='Age', lowess=True, fit_reg=False, color="lightsteelblue")
ax.set_title(f'Regression plot of Age in relation to Nobel Prizes won in history:')
ax.legend(labels=['Average Age', 'Youth', 'Young Adult', 'Adult', 'Senior'], loc='upper right')
# The trend in age is clearly increasing for nobel prize winners, though we see some differences across the prize categories
g = sns.FacetGrid(df_nobel_prizes, row='Category', height=2, aspect=4)
g.map_dataframe(sns.regplot, x='Year', y='Age', scatter=False, lowess=True, line_kws={'color': 'black'}) # Only Lowess for Male/Female combined
g.map_dataframe(sns.scatterplot, x='Year', y='Age', hue='Age_Group', palette={"Youth": "orange", "Young Adult": "forestgreen", "Adult": "royalblue", "Senior": "lightsteelblue"})
g.add_legend()
# The oldest winner of a Nobel Prize as of 2016
df_nobel_prizes.nlargest(1, "Age")
| | Year | Category | Prize | Motivation | Prize Share | Laureate ID | Laureate Type | Full Name | Birth Date | Birth City | ... | Organization Name | Organization City | Organization Country | Death Date | Death City | Death Country | Decade | Age | Age_Group | Generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 825 | 2007 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for having laid the foundations of mechanism ... | 1/3 | 820 | Individual | Leonid Hurwicz | 1917-08-21 | Moscow | ... | University of Minnesota | Minneapolis, MN | United States of America | 2008-06-24 | Minneapolis, MN | United States of America | 2000 | 90.0 | Senior | Greatest Generation |
1 rows × 22 columns
# The youngest winner of a Nobel Prize as of 2016
df_nobel_prizes.nsmallest(1, "Age")
| | Year | Category | Prize | Motivation | Prize Share | Laureate ID | Laureate Type | Full Name | Birth Date | Birth City | ... | Organization Name | Organization City | Organization Country | Death Date | Death City | Death Country | Decade | Age | Age_Group | Generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 940 | 2014 | Peace | The Nobel Peace Prize 2014 | "for their struggle against the suppression of... | 1/2 | 914 | Individual | Malala Yousafzai | 1997-07-12 | Mingora | ... | NaN | NaN | NaN | NaN | NaN | NaN | 2010 | 17.0 | Youth | Generation Z |
1 rows × 22 columns
# Let's also have a look at the Nobel laureates per generation
fig, ax = plt.subplots(figsize=(10, 10))
sns.boxplot(ax=ax, data=df_nobel_prizes, x='Year', y='Generation',fliersize=0, palette="hls")
sns.stripplot(ax=ax, data=df_nobel_prizes, x='Year', y='Generation', palette="hls")
ax.set_title(f'Boxplot of Generation in relation to Nobel Prizes won in history:')
repeat_laureates = df_nobel_prizes.groupby('Full Name').filter(lambda winner: len(winner) > 1)
display(repeat_laureates[['Full Name', 'Birth Country', 'Laureate Type']].value_counts().to_frame())
| Full Name | Birth Country | Laureate Type | Count |
|---|---|---|---|
| Frederick Sanger | United Kingdom | Individual | 2 |
| John Bardeen | United States of America | Individual | 2 |
| Linus Carl Pauling | United States of America | Individual | 2 |
| Marie Curie, née Sklodowska | Russian Empire (Poland) | Individual | 2 |
plt.figure(figsize= (30,20))
sns.swarmplot(y ="Category", x = "Year", data = df_nobel, hue = "Sex",)
plt.suptitle("Gender distribution of prize winners by year and category", fontsize = 20)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.show()
plt.figure(figsize= (18,5))
sns.swarmplot(y ="Category", x = "Year", data = data[(data.Category == 'Chemistry')], hue = "Sex",)
plt.suptitle("Gender distribution of prize winners by year and category of Chemistry", fontsize = 20)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.show()
plt.figure(figsize= (18,5))
sns.swarmplot(y ="Category", x = "Year", data = data[(data.Category == 'Literature')], hue = "Sex",)
plt.suptitle("Gender distribution of prize winners by year and category of Literature", fontsize = 20)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.show()
plt.figure(figsize= (18,5))
sns.swarmplot(y ="Category", x = "Year", data = data[(data.Category == 'Medicine')], hue = "Sex",)
plt.suptitle("Gender distribution of prize winners by year and category of Medicine", fontsize = 20)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.show()
plt.figure(figsize= (18,5))
sns.swarmplot(y ="Category", x = "Year", data = data[(data.Category == 'Peace')], hue = "Sex",)
plt.suptitle("Gender distribution of prize winners by year and category of Peace", fontsize = 20)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.show()
plt.figure(figsize= (18,5))
sns.swarmplot(y ="Category", x = "Year", data = data[(data.Category == 'Physics')], hue = "Sex",)
plt.suptitle("Gender distribution of prize winners by year and category of Physics", fontsize = 20)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.show()
plt.figure(figsize= (18,5))
sns.swarmplot(y ="Category", x = "Year", data = data[(data.Category == 'Economics')], hue = "Sex",)
plt.suptitle("Gender distribution of prize winners by year and category of Economics", fontsize = 20)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.show()
plt.figure(figsize= (30,20))
sns.swarmplot(y ="Category", x = "Year", data = data[(data.Sex == 'Female')], hue = "Sex",)
plt.suptitle("Gender distribution of prize winners by year and category only Female", fontsize = 20)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.show()
plt.figure(figsize= (30,10))
# df was reassigned above and no longer has 'Age' or 'Category'; use the prize-level frame instead
sns.kdeplot(
    data=df_nobel_prizes, x="Age", hue="Category",
    fill=True, common_norm=False, palette="Paired",
    alpha=.5, linewidth=0,
)
sns.despine(top = True, right = True, left = False, bottom = False)
plt.suptitle("Age distribution of prize winners by category", fontsize = 20)
plt.show()
pip install bar_chart_race
import bar_chart_race as bcr
# bar_chart_race needs wide data: one row per period and one numeric column per bar.
# Passing df_nobel directly fails with "IndexError: boolean index did not match indexed array".
# Build cumulative prize counts per birth country, indexed by award year.
wins_by_country = (df_nobel_prizes.groupby(['Year', 'Birth Country']).size()
                   .unstack(fill_value=0)
                   .cumsum())
# Writing an .mp4 requires ffmpeg (https://www.ffmpeg.org/download.html);
# without it the call raises "Exception: You do not have ffmpeg installed on your machine."
bcr.bar_chart_race(
    df=wins_by_country,
    filename='nobel_prize.mp4',
    orientation='h',
    sort='desc',
    n_bars=6,
    fixed_order=False,
    fixed_max=True,
    steps_per_period=10,
    interpolate_period=False,
    label_bars=True,
    bar_size=.95,
    period_label={'x': .99, 'y': .25, 'ha': 'right', 'va': 'center'},
    period_fmt='{x:.0f}',  # Year is an integer index, so date directives such as '%B %d, %Y' do not apply
    period_summary_func=lambda v, r: {'x': .99, 'y': .18,
                                      's': f'Top 6 total: {v.nlargest(6).sum():,.0f}',
                                      'ha': 'right', 'size': 8, 'family': 'Courier New'},
    perpendicular_bar_func='median',
    period_length=500,
    figsize=(5, 3),
    dpi=144,
    cmap='dark12',
    title='Wins by Country',
    title_size=None,
    bar_label_size=7,
    tick_label_size=7,
    shared_fontdict={'family' : 'Helvetica', 'color' : '.1'},
    scale='linear',
    writer=None,
    fig=None,
    bar_kwargs={'alpha': .7},
    filter_column_colors=False)
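`bar_chart_race` works on wide data: each row is one period and each column is one bar. The Nobel table is in long form (one row per laureate), so it has to be reshaped first. A minimal sketch of that reshaping on hypothetical toy rows (not the real data) could look like this:

```python
import pandas as pd

# Toy long-form data: one row per awarded prize (hypothetical values)
toy = pd.DataFrame({
    'Year': [1901, 1901, 1902, 1903, 1903],
    'Birth Country': ['France', 'Germany', 'France', 'Germany', 'Germany'],
})

# Count prizes per year and country, pivot countries into columns,
# then accumulate over the years to get running totals
wide = (toy.groupby(['Year', 'Birth Country']).size()
           .unstack(fill_value=0)
           .cumsum())
print(wide)
```

The resulting frame has the years as its index and one numeric column per country, which is the layout the animation function expects.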
Full code:
———————————————————————————————
import pandas as pd
import bar_chart_race as bcr
# open csv file from Johns Hopkins University
df = pd.read_csv('time_series_covid19_confirmed_global.csv')
# remove longitude and latitude values
df = df.drop(columns=["Lat","Long"])
# combine Province/State and Country/Region and make a new column called Location
df['Location'] = df[['Province/State','Country/Region']].apply(lambda x: ', '.join(x.dropna()),axis=1)
# remove Province/State and Country/Region columns
df = df.drop(columns=['Province/State', 'Country/Region'])
# move the combined values to the first column
cols = list(df.columns)
cols = [cols[-1]] + cols[:-1]
df = df[cols]
# transpose the dataframe by flipping the columns and rows
df_transposed = df.T
df_transposed.columns = df_transposed.iloc[0].to_list()
df_transposed = df_transposed.iloc[1:]
df_transposed
# label the index as "Date"
df_transposed.index.names = ['Date']
# specify the locations to be included
cols = ['Hubei, China','Germany','Spain','United Kingdom','US','India', 'Brazil','Russia','France','Italy']
subset = df_transposed[cols]
# create a new dataframe and make sure all the cells are in a numeric form
cum_sum_df = subset.cumsum(axis=0)
cum_sum_df = cum_sum_df.apply(pd.to_numeric)
# turn index to datetime objects
cum_sum_df.index = pd.to_datetime(cum_sum_df.index)
# plot the racebars
bcr.bar_chart_race(
df=cum_sum_df,
title="COVID-19 Case by country",
filename="covid-19-visualization.mp4",
period_fmt="%b %-d, %Y",
n_bars=8,
steps_per_period=100,
interpolate_period=True
)
# Shell command, not Python: run it in a terminal rather than in a notebook cell
# (a Python cell raises "NameError: name 'bash' is not defined")
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
USAData = data[data['Birth Country']=='United States of America']
USAData.head()
| | Year | Category | Prize | Motivation | Prize Share | Laureate ID | Laureate Type | Full Name | Birth Date | Birth City | ... | Organization Name | Organization City | Organization Country | Death Date | Death City | Death Country | Decade | Age | Age_Group | Generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35 | 1906 | Peace | The Nobel Peace Prize 1906 | NaN | 1/1 | 470 | Individual | Theodore Roosevelt | 1858-10-27 | New York, NY | ... | NaN | NaN | NaN | 1919-01-06 | Oyster Bay, NY | United States of America | 1900 | 48.0 | Adult | Ancient |
| 73 | 1912 | Peace | The Nobel Peace Prize 1912 | NaN | 1/1 | 480 | Individual | Elihu Root | 1845-02-15 | Clinton, NY | ... | NaN | NaN | NaN | 1937-02-07 | New York, NY | United States of America | 1910 | 67.0 | Senior | Ancient |
| 80 | 1914 | Chemistry | The Nobel Prize in Chemistry 1914 | "in recognition of his accurate determinations... | 1/1 | 175 | Individual | Theodore William Richards | 1868-01-31 | Germantown, PA | ... | Harvard University | Cambridge, MA | United States of America | 1928-04-02 | Cambridge, MA | United States of America | 1910 | 46.0 | Adult | Ancient |
| 96 | 1919 | Peace | The Nobel Peace Prize 1919 | NaN | 1/1 | 483 | Individual | Thomas Woodrow Wilson | 1856-12-28 | Staunton, VA | ... | NaN | NaN | NaN | 1924-02-03 | Washington, DC | United States of America | 1910 | 63.0 | Adult | Ancient |
| 118 | 1923 | Physics | The Nobel Prize in Physics 1923 | "for his work on the elementary charge of elec... | 1/1 | 28 | Individual | Robert Andrews Millikan | 1868-03-22 | Morrison, IL | ... | California Institute of Technology (Caltech) | Pasadena, CA | United States of America | 1953-12-19 | San Marino, CA | United States of America | 1920 | 55.0 | Adult | Ancient |
5 rows × 22 columns
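The `Birth City` values in the preview follow a "City, ST" pattern, so one way to get something mappable is to pull out the state code and count laureates per state. The following is a minimal sketch, not the notebook's method: `sample` is a stand-in for `USAData` built from the five rows shown above, and `STATE_FIPS` is a deliberately partial, hand-written mapping from state code to FIPS id that would need to be extended to all states before real use.

```python
import pandas as pd

# Stand-in for USAData: the five laureates visible in the preview above.
sample = pd.DataFrame({
    "Full Name": ["Theodore Roosevelt", "Elihu Root", "Theodore William Richards",
                  "Thomas Woodrow Wilson", "Robert Andrews Millikan"],
    "Birth City": ["New York, NY", "Clinton, NY", "Germantown, PA",
                   "Staunton, VA", "Morrison, IL"],
})

# Pull the two-letter code that follows the comma ("New York, NY" -> "NY").
sample["Birth State"] = sample["Birth City"].str.extract(r",\s*([A-Z]{2})\b", expand=False)

# Partial mapping from state code to FIPS id (assumption: extend before real use).
STATE_FIPS = {"NY": 36, "PA": 42, "VA": 51, "IL": 17}

# One row per state with the number of laureates born there.
state_counts = sample.groupby("Birth State").size().rename("laureates").reset_index()
state_counts["id"] = state_counts["Birth State"].map(STATE_FIPS)

print(state_counts)
```

A table shaped like `state_counts`, with a numeric `id` column and a value column, is the kind of input a lookup against state shapes would need.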
import altair as alt
from vega_datasets import data as vega_data  # aliased so the Nobel `data` DataFrame is not overwritten
# topo_feature() takes the name of a feature inside the TopoJSON file as a string
# ('states' or 'counties' for us_10m). Passing the Series USAData['Birth City'] raises
# SchemaValidationError: Invalid specification ... TopoDataFormat->feature ... is not of type 'string'
states = alt.topo_feature(vega_data.us_10m.url, 'states')
# USAData has no 'id' or 'rate' columns, so the lookup and color encoding cannot be applied to it
# directly; draw the base map only. properties() takes keyword arguments, not a single string.
alt.Chart(states).mark_geoshape(fill='lightgray', stroke='white').project(
    type='albersUsa'
).properties(
    width=500,
    height=300
)
import altair as alt
from vega_datasets import data as vega_data  # aliased so the Nobel `data` DataFrame is not overwritten
# The second argument of topo_feature() must be a feature name given as a string.
# With USAData['Birth City'] in its place, this cell fails with the same
# SchemaValidationError ("... is not of type 'string'").
counties = alt.topo_feature(vega_data.us_10m.url, 'counties')
source = vega_data.unemployment.url
alt.Chart(counties).mark_geoshape().encode(
    color='rate:Q'
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(source, 'id', ['rate'])
).project(
    type='albersUsa'
).properties(
    width=500,
    height=300
)